 processing layer



Structured Token Retention and Computational Memory Paths in Large Language Models

Delena, Jonathan, Moreau, Augustin, Ravensdale, Dominic, Chatterton, Frederick

arXiv.org Artificial Intelligence

Memory retention mechanisms play a central role in determining the efficiency of computational architectures designed for processing extended sequences. Conventional methods for token management often impose fixed retention thresholds or rely on uniform attention weight distributions, leading to inefficient memory utilization and premature information loss in extended sequence modeling. Structured Token Retention (STR) introduces a probabilistic selection framework that dynamically adjusts token persistence based on contextual significance, ensuring that computational resources are allocated to semantically relevant elements. Computational Memory Paths (CMP) extend this framework through hierarchical memory allocation, refining retention efficiency through structured reallocation of token embeddings. Comparative assessments against baseline models demonstrate that STR and CMP improve token survival rates across long input sequences while reducing cumulative error propagation across processing layers. Experimental results further indicate reductions in computational overhead, improving inference speed without degrading contextual coherence. Token distribution analyses reveal that structured memory allocation prevents excessive redundancy in attention weight calculations, optimizing information retrieval efficiency in large-scale generative architectures. The integration of STR and CMP into an open-source model illustrates the adaptability of structured memory retention methodologies, highlighting their applicability in generative text processing, long-context comprehension, and scalable sequence modeling.
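The abstract does not give the selection rule itself, so the following is only a minimal sketch of what probabilistic, significance-based token retention could look like. The logistic mapping, the `temperature` parameter, and the function names are all assumptions made for illustration. They are not the STR or CMP formulation from the paper.

```python
import math
import random

def retention_probabilities(scores, temperature=1.0):
    """Map per-token significance scores to keep-probabilities.

    Assumption: a logistic function centered on the mean score, so tokens
    scoring above average are more likely to be kept than dropped.
    """
    mean = sum(scores) / len(scores)
    return [1.0 / (1.0 + math.exp(-(s - mean) / temperature)) for s in scores]

def retain(tokens, scores, rng, temperature=1.0):
    """Keep each token independently with its retention probability."""
    probs = retention_probabilities(scores, temperature)
    return [tok for tok, p in zip(tokens, probs) if rng.random() < p]

# Hypothetical tokens and significance scores.
tokens = ["the", "memory", "of", "retention"]
scores = [0.1, 0.9, 0.2, 0.8]
kept = retain(tokens, scores, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sampling reproducible. Lowering `temperature` makes the selection closer to a hard threshold on the mean score.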


Optimal Architectures in a Solvable Model of Deep Networks

Neural Information Processing Systems

Deep neural networks have received considerable attention due to the success of their training for real-world machine learning applications. They are also of great interest for understanding sensory processing in cortical sensory hierarchies. The purpose of this work is to advance our theoretical understanding of the computational benefits of these architectures. Using a simple model of clustered noisy inputs and a simple learning rule, we provide analytically derived recursion relations describing the propagation of the signals along the deep network. By analyzing these equations and defining performance measures, we show that these model networks have optimal depths. We further explore the dependence of the optimal architecture on the system parameters.


RoboSync: Efficient Real-Time Operating System for Social Robots with Customizable Behaviour

Tang, Cheng, Feng, Yijing, Hu, Yue

arXiv.org Artificial Intelligence

Traditional robotic systems require complex implementations that are not always accessible or easy to use for Human-Robot Interaction (HRI) application developers. With the aim of simplifying the implementation of HRI applications, this paper introduces RoboSync, a novel real-time operating system (RTOS) designed for customizable HRI. By creating multi-level abstraction layers, the system enables users to define complex emotional and behavioral models without needing deep technical expertise. The system's modular architecture comprises a behavior modeling layer, a machine learning plugin configuration layer, a sensor checks customization layer, a scheduler that fits the needs of HRI, and a communication and synchronization layer. This approach not only promotes ease of use without highly specialized skills but also ensures real-time responsiveness and adaptability. The primary functionality of the RTOS has been implemented as a proof of concept and was tested on a Cortex-M4 microcontroller, demonstrating its potential for a wide range of lightweight, simple-to-implement social robotics applications.


Supervised GAN Watermarking for Intellectual Property Protection

Fei, Jianwei, Xia, Zhihua, Tondi, Benedetta, Barni, Mauro

arXiv.org Artificial Intelligence

We propose a watermarking method for protecting the Intellectual Property (IP) of Generative Adversarial Networks (GANs). The aim is to watermark the GAN model so that any image generated by the GAN contains an invisible watermark (signature), whose presence inside the image can be checked at a later stage for ownership verification. To achieve this goal, a pre-trained CNN watermark decoding block is inserted at the output of the generator. The generator loss is then modified by including a watermark loss term, to ensure that the prescribed watermark can be extracted from the generated images. The watermark is embedded via fine-tuning, with reduced time complexity. Results show that our method can effectively embed an invisible watermark inside the generated images. Moreover, our method is general and can work with different GAN architectures, different tasks, and different resolutions of the output image. We also demonstrate the robustness of the embedded watermark against several post-processing operations, including JPEG compression, noise addition, blurring, and color transformations.
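The abstract says only that a watermark loss term is added to the generator loss. As a rough sketch, one common way to combine the two is a weighted sum. The binary cross-entropy form of the watermark term, the weight `lam`, and all function names below are assumptions for illustration, not the paper's definition.

```python
import math

def watermark_loss(decoded_probs, watermark_bits, eps=1e-12):
    """Mean binary cross-entropy between decoded bit probabilities and target bits.

    decoded_probs: hypothetical decoder outputs in [0, 1], one per watermark bit.
    watermark_bits: the prescribed watermark as 0/1 values.
    """
    total = 0.0
    for p, t in zip(decoded_probs, watermark_bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(watermark_bits)

def generator_loss(adversarial_loss, decoded_probs, watermark_bits, lam=1.0):
    """Adversarial loss plus a weighted watermark term (assumed weighted sum)."""
    return adversarial_loss + lam * watermark_loss(decoded_probs, watermark_bits)

# Hypothetical values: a 4-bit watermark that the decoder recovers fairly well.
loss = generator_loss(0.7, [0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0], lam=0.5)
```

Under this assumed form, a larger `lam` pushes fine-tuning toward reliable watermark extraction at the possible expense of the original generation objective.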


Deep learning

#artificialintelligence

Deep learning is a subset of machine learning in artificial intelligence that uses networks capable of learning, without supervision, from unstructured or unlabeled data. It is also known as deep neural learning or deep neural networks. Machine learning, in turn, is the scientific study that gives computers the ability to learn without being explicitly programmed. Deep learning uses neural networks to learn representations of data.


Neural Networks -- What is it and Why does it matter?

#artificialintelligence

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated. Neural networks can adapt to changing input, so the network generates the best possible result without needing to redesign the output criteria.


Unsupervised Clustering of Time Series Signals using Neuromorphic Energy-Efficient Temporal Neural Networks

Chaudhari, Shreyas, Nair, Harideep, Moura, José M. F., Shen, John Paul

arXiv.org Artificial Intelligence

Unsupervised time series clustering is a challenging problem with diverse industrial applications such as anomaly detection and bio-wearables. These applications typically involve small, low-power devices on the edge that collect and process real-time sensory signals. State-of-the-art time-series clustering methods perform some form of loss minimization that is extremely computationally intensive from the perspective of edge devices. In this work, we propose a neuromorphic approach to unsupervised time series clustering based on Temporal Neural Networks that is capable of ultra low-power, continuous online learning. We demonstrate its clustering performance on a subset of UCR Time Series Archive datasets. Our results show that the proposed approach either outperforms or performs similarly to most of the existing algorithms while being far more amenable to efficient hardware implementation. Our hardware assessment analysis shows that in 7 nm CMOS the proposed architecture, on average, consumes only about 0.005 mm^2 die area and 22 uW power and can process each signal with about 5 ns latency.


Comprehensive TensorFlow.js Example

#artificialintelligence

First I will walk you through the app functionality and then dive into implementation details. This app implements a business report execution time prediction use case (this time in JavaScript), which was explained in my previous post -- Report Time Execution Prediction with Keras and TensorFlow. For the model training, I'm using 50 epochs (data is processed in batches of 10) and the learning rate is set to 0.001. The neural network is based on two processing layers and one output layer. The model is trained to forecast the expected wait time for business report execution.
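To make the training settings concrete, here is a minimal pure-Python sketch of how 50 epochs, batches of 10, and a learning rate of 0.001 interact in mini-batch gradient descent. It is not the post's TensorFlow.js code: a single linear unit stands in for the network, and the data and all function names are hypothetical.

```python
EPOCHS = 50
BATCH_SIZE = 10
LEARNING_RATE = 0.001

def batches(samples, batch_size=BATCH_SIZE):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def updates_per_run(n_samples, epochs=EPOCHS, batch_size=BATCH_SIZE):
    """Total number of weight updates: epochs times batches per epoch."""
    return epochs * -(-n_samples // batch_size)  # ceiling division

def mse(w, b, samples):
    """Mean squared error of the linear model w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

def train(samples, epochs=EPOCHS, batch_size=BATCH_SIZE, lr=LEARNING_RATE):
    """Fit w and b by mini-batch gradient descent on the squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in batches(samples, batch_size):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w  # one update per batch
            b -= lr * grad_b
    return w, b

# Hypothetical data: wait time grows linearly with a single report feature.
data = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(100)]
w, b = train(data)
```

With 100 samples, these settings give 10 batches per epoch and 500 weight updates in total, which is what `updates_per_run(100)` computes.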


Distributed Deep Convolutional Neural Networks for the Internet-of-Things

Disabato, Simone, Roveri, Manuel, Alippi, Cesare

arXiv.org Machine Learning

Due to the high demand in computation and memory, deep learning solutions are mostly restricted to high-performance computing units, e.g., those present in servers, Cloud, and computing centers. In pervasive systems, e.g., those involving Internet-of-Things (IoT) technological solutions, this would require transmitting the acquired data from IoT sensors to the computing platform and waiting for its output. This solution might become infeasible when remote connectivity is either unavailable or limited in bandwidth. Moreover, it introduces uncertainty in the latency between data production and decision making, which, in turn, might impair control loop stability if the response is used to drive IoT actuators. In order to support a real-time recall phase directly at the IoT level, deep learning solutions must be completely rethought with the memory and computation constraints of IoT units in mind. In this paper we focus on Convolutional Neural Networks (CNNs), a specific deep learning solution for image and video classification, and introduce a methodology for distributing their computation onto the units of the IoT system. We formalize this methodology as an optimization problem in which the latency between the data-gathering phase and the subsequent decision-making phase is minimized. The methodology supports multiple IoT sources of data as well as multiple CNNs in execution on the same IoT system, making it a general-purpose distributed computing platform for CNN-based applications demanding autonomy, low decision latency, and high Quality-of-Service.
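The abstract frames layer placement as a latency-minimization problem under memory and computation constraints but does not give the formulation. The toy sketch below illustrates the idea with an exhaustive search over assignments of a chain of layers to units. The cost model, the uniform bandwidth, and every number are assumptions chosen for illustration, not values or equations from the paper.

```python
from itertools import product

# Hypothetical CNN layers: (operations, memory_bytes, output_bytes).
LAYERS = [
    (4_000_000, 30_000, 20_000),
    (8_000_000, 60_000, 5_000),
    (2_000_000, 40_000, 100),
]
# Hypothetical IoT units: (operations_per_second, memory_bytes).
UNITS = [
    (20_000_000, 64_000),
    (40_000_000, 128_000),
]
BANDWIDTH = 250_000   # bytes per second between any two units (assumed uniform)
INPUT_BYTES = 10_000  # size of the raw sensor reading
SOURCE_UNIT = 0       # unit that holds the sensor

def latency(assignment):
    """Return end-to-end latency in seconds, or None if a unit runs out of memory."""
    used = [0] * len(UNITS)
    for (_, mem, _), unit in zip(LAYERS, assignment):
        used[unit] += mem
    if any(m > cap for m, (_, cap) in zip(used, UNITS)):
        return None
    total, location, data = 0.0, SOURCE_UNIT, INPUT_BYTES
    for (ops, _, out), unit in zip(LAYERS, assignment):
        if unit != location:
            total += data / BANDWIDTH  # ship the activations to the next unit
        total += ops / UNITS[unit][0]  # compute the layer on that unit
        location, data = unit, out
    return total

def best_placement():
    """Exhaustively search all layer-to-unit assignments for the lowest latency."""
    best = None
    for assignment in product(range(len(UNITS)), repeat=len(LAYERS)):
        t = latency(assignment)
        if t is not None and (best is None or t < best[0]):
            best = (t, assignment)
    return best

best_time, best_assignment = best_placement()
```

Exhaustive search is only practical for a handful of layers and units, since the number of assignments grows exponentially with the number of layers. A realistic instance would need a proper optimization solver.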